Mining Time-Changing Data Streams
نویسنده
چکیده
Streaming data have gained considerable attention in database and data mining communities because of the emergence of a class of applications, such as financial marketing, sensor networks, internet IP monitoring, and telecommunications that produce these data. Data streams have some unique characteristics that are not exhibited by traditional data: unbounded, fast-arriving, and time-changing. Traditional data mining techniques that make multiple passes over data or that ignore distribution changes are not applicable to dynamic data streams. Mining data streams has been an active research area to address requirements of the streaming applications. This thesis focuses on developing techniques for distribution change detection and mining time-changing data streams. Two techniques are proposed that can detect distribution changes in generic data streams. One approach for tackling one of the most popular stream mining tasks, frequent itemsets mining, is also presented in this thesis. All the proposed techniques are implemented and empirically studied. Experimental results show that the proposed techniques can achieve promising performance for detecting changes and mining dynamic data streams.
منابع مشابه
Incremental Mining of Across-streams Sequential Patterns in Multiple Data Streams
Sequential pattern mining is the mining of data sequences for frequent sequential patterns with time sequence, which has a wide application. Data streams are streams of data that arrive at high speed. Due to the limitation of memory capacity and the need of real-time mining, the results of mining need to be updated in real time. Multiple data streams are the simultaneous arrival of a plurality ...
متن کاملUpdating formulas and algorithms for computing entropy and Gini index on time-changing data streams
Despite growing interest in data stream mining the most successful incremental learners still use periodic recomputation to update attribute information gains and Gini indices. This note provides simple incremental formulas and algorithms for computing entropy and Gini index from time-changing data streams.
متن کاملSentiment Knowledge Discovery in Twitter Streaming Data
Micro-blogs are a challenging new source of information for data mining techniques. Twitter is a micro-blogging service built to discover what is happening at any moment in time, anywhere in the world. Twitter messages are short, and generated constantly, and well suited for knowledge discovery using data stream mining. We briefly discuss the challenges that Twitter data streams pose, focusing ...
متن کاملMining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows
Todays, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that frequently appear in data set and mark them as frequent patterns to enable users to make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection, work either on uncertai...
متن کاملPredicting Sequential Pattern Changes in Data Streams
Data streams are utilized in an increasing number of real-time information technology applications. Unlike traditional datasets, data streams are temporally ordered, fast changing and massive. Due to their tremendous volume, performing multiple scans of the entire data stream is impractical. Thus, traditional sequential pattern mining algorithms cannot be applied. Accordingly, the present study...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011